ABSTRACT


It is natural to associate optimal estimation and optimal statistical testing. In this paper, continuous function estimation of the cumulative distribution is used to define two test statistics that compete with the Kolmogorov $D_N$ statistic. The first statistic, $C_N$, is attributed to Pyke, and the second, $R_N$, is obtained by polygonalizing the sample distribution function. It is shown that both are asymptotically equivalent to the Kolmogorov statistic. Using the methods of J. Durbin, the small-sample distributions are tabled, as are the critical points for significance levels of .20, .10, .05, .025, and .01. It is shown that $R_N$ is stochastically smaller than $D_N$, and it appears that $C_N$ is also stochastically smaller than $D_N$.
I. INTRODUCTION


The popular method for testing the simple hypothesis $H_0: F = F_0$, where $F$ is the unknown cumulative distribution of the population sampled and $F_0$ is the hypothesized continuous distribution, is the Kolmogorov $D_N$ statistic [Ref. 4]. The asymptotic distribution of $D_N$ was developed by Kolmogorov [Ref. 2] and tabled by Smirnov [Ref. 6]. The small-sample distribution was tabled by Massey [Refs. 3 and 4] and has been treated recently (among others) by Durbin [Ref. 1]. A modification of the Kolmogorov test has been proposed by Pyke [Ref. 5], and the distribution of this test statistic, called $C_N$, can be obtained from Durbin [Ref. 1].


The Kolmogorov $D_N$ statistic uses a step-function estimate of $F$ in the computation of the test statistic. This estimator, however, belongs to a class that cannot contain $F$ under the assumptions of the method, since $F$ is assumed a priori to be continuous. Of course, the estimator converges to the continuous function $F$ as the sample size increases without limit.

The question of estimating the distribution $F$ is separate from that of testing $H_0$. However, it is natural to associate the two and to expect optimal estimation to yield optimal testing methods. Two of the statistics considered in this paper may be viewed as being based upon continuous estimators of $F$. It is shown that each is asymptotically equivalent to $D_N$.

The statistics investigated are presented in section two. The asymptotic behavior of each statistic, and the relations among them, are shown in section three. Section four presents the computational method used in tabling the distributions of the statistics introduced in section two. Final comparisons are covered in section five. The tables are in the appendices.


II. BASIC DEFINITIONS, KOLMOGOROV STATISTIC, PYKE STATISTIC, AND A PROPOSED STATISTIC


Let $Y_1, Y_2, \ldots, Y_N$ be a random sample from a continuous population with cumulative distribution function $F(y)$. The methods presented below are for testing the simple hypothesis

$$H_0: F(y) = F_0(y) \quad \text{for all } y.$$

The procedures discussed are based on the order statistics of the sample, where

$$X_1 \le X_2 \le \cdots \le X_N$$

are the ordered sample points, and the sample cumulative distribution is

$$F_N(x) = \begin{cases} 0, & x < X_1, \\ j/N, & X_j \le x < X_{j+1}, \quad j = 1, \ldots, N-1, \\ 1, & x \ge X_N. \end{cases}$$


The Kolmogorov $D_N$ statistic for testing $H_0$ is

$$D_N = \sup_x |F_N(x) - F_0(x)|$$

and has the property of being distribution free under $H_0$ [Ref. 4]. Because the transformed observations $F_0(Y_i)$ are independent uniform (0,1) random variables under $H_0$, this property allows the cumulative distribution of $D_N$ to be computed with $F_0$ taken to be uniform (0,1) without loss of generality.


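Lemma 1: $D_N = \max_{1 \le j \le N} \{ |X_j - j/N|,\ |X_j - (j-1)/N| \}$.

Proof: After transformation, under $H_0$ the statistic becomes

$$D_N = \sup_{0 < x < 1} |F_N(x) - x|.$$

Since $F_N$ is constant between successive order statistics while $x$ increases, $|F_N(x) - x|$ attains or approaches its supremum on each such interval at an endpoint, that is, at some $X_j$ where $F_N$ equals either $j/N$ or $(j-1)/N$. The assertion is then evident from Figure 1.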
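As a numerical sanity check of Lemma 1, the Python sketch below computes $D_N$ from the two-term maximum. It then compares that value with a direct evaluation of $|F_N(y) - F_0(y)|$ on a grid, plus the points at and immediately to the left of each observation. The unit-exponential $F_0$, the grid, and the function names are illustrative assumptions, not part of the thesis program. `math.nextafter` needs Python 3.9 or later.

```python
import bisect
import math
import random


def d_lemma1(sample, cdf):
    """D_N by Lemma 1: transform by F_0, sort, compare X_j with j/N and (j-1)/N."""
    xs = sorted(cdf(y) for y in sample)  # probability integral transform
    n = len(xs)
    return max(max(abs(x - j / n), abs(x - (j - 1) / n))
               for j, x in enumerate(xs, start=1))


def d_direct(sample, cdf, lo, hi, steps):
    """Largest |F_N(y) - F_0(y)| over a grid of [lo, hi] and the points at,
    and just to the left of, each observation (where F_N jumps)."""
    ys = sorted(sample)
    n = len(ys)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    near_jumps = [p for y in ys for p in (y, math.nextafter(y, -math.inf))]
    # F_N(y) = (number of observations <= y) / N
    return max(abs(bisect.bisect_right(ys, y) / n - cdf(y))
               for y in grid + near_jumps)


def exp_cdf(y):
    """Hypothetical F_0 for the demonstration: unit exponential."""
    return 1.0 - math.exp(-y)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.expovariate(1.0) for _ in range(20)]
    print(d_lemma1(sample, exp_cdf), d_direct(sample, exp_cdf, 0.0, 20.0, 20_000))
```

The two numbers agree to rounding error. No grid point exceeds the value at the jump points, which is the content of the lemma.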
A modification of the Kolmogorov statistic is attributed to Pyke [Ref. 5], where the test statistic is

$$C_N = \max_{1 \le j \le N} |X_j - j/(N+1)|.$$


Implicit in this method is the idea that the $j$th order statistic estimates the $100\,j/(N+1)$ percentile of $F$. Such estimators are unbiased when $F$ is uniform (0,1), since then $E[X_j] = j/(N+1)$.
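The unbiasedness claim can be checked by simulation. The Python sketch below averages each uniform (0,1) order statistic over many simulated samples. The sample size, replication count, and seed are arbitrary choices. The averages settle near $j/(N+1)$ rather than $j/N$.

```python
import random


def mean_order_stats(n, reps, seed=0):
    """Monte Carlo averages of the uniform (0,1) order statistics X_1 <= ... <= X_n."""
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(reps):
        for j, x in enumerate(sorted(rng.random() for _ in range(n))):
            sums[j] += x
    return [s / reps for s in sums]


if __name__ == "__main__":
    n = 5
    for j, m in enumerate(mean_order_stats(n, 100_000), start=1):
        # simulated mean vs. Pyke's j/(N+1) and the step-function height j/N
        print(j, round(m, 4), round(j / (n + 1), 4), j / n)
```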

The statistic $C_N$ may be viewed as being based on a continuous function estimator of $F$, constructed by connecting the points $(0,0), (X_1, 1/(N+1)), \ldots, (X_N, N/(N+1)), (1,1)$ with straight line segments. This is apparent because any point on the line segment $L$ from $(X_j, j/(N+1))$ to $(X_{j+1}, (j+1)/(N+1))$ is a convex combination of the endpoints. The vertical separation between $L$ and the line $y = x$ is therefore linear along $L$, so its maximum absolute value must occur at one of the endpoints of $L$.

It is of interest to consider a sample cumulative distribution formed by connecting the points $(0,0), (X_1, 1/N), \ldots, (X_N, 1)$ with straight line segments, together with the resultant statistic

$$R_N = \sup_x |S_N(x) - F_0(x)|,$$

where $S_N(x)$ is the sample cumulative distribution modified as described above. Properties of $R_N$ are presented in section three, and its distribution for finite $N$ is tabled in Appendix B.

[Figure 1: the sample distribution function and the line y = x, illustrating Lemma 1; the distance |(j-1)/N - X_j| is marked.]
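The vertex argument for both polygonal estimators can be checked numerically. The Python sketch below evaluates each polygon on a fine grid of $[0,1]$ and compares the largest gap from $y = x$ with the vertex maxima. The vertex heights are $j/(N+1)$ for $C_N$ and $j/N$ for $R_N$. The sketch makes three assumptions: the data are already transformed to uniform (0,1), the $R_N$ polygon is closed off by the flat segment from $(X_N, 1)$ to $(1,1)$, and the helper names are illustrative.

```python
import random


def polygon_gap(xs, heights, steps=100_000):
    """Grid maximum of |L(t) - t| on [0, 1], L joining (0,0), (xs[j], heights[j]), (1,1)."""
    px = [0.0] + list(xs) + [1.0]
    py = [0.0] + list(heights) + [1.0]
    best, k = 0.0, 0
    for i in range(steps + 1):
        t = i / steps
        while px[k + 1] < t:  # advance to the segment [px[k], px[k+1]] containing t
            k += 1
        w = (t - px[k]) / (px[k + 1] - px[k])
        best = max(best, abs(py[k] + w * (py[k + 1] - py[k]) - t))
    return best


def c_stat(xs):
    """Pyke's C_N: largest gap at the vertices (X_j, j/(N+1))."""
    n = len(xs)
    return max(abs(x - j / (n + 1)) for j, x in enumerate(xs, start=1))


def r_stat(xs):
    """R_N: largest gap at the vertices (X_j, j/N)."""
    n = len(xs)
    return max(abs(x - j / n) for j, x in enumerate(xs, start=1))


if __name__ == "__main__":
    rng = random.Random(3)
    xs = sorted(rng.random() for _ in range(15))
    n = len(xs)
    print(c_stat(xs), polygon_gap(xs, [j / (n + 1) for j in range(1, n + 1)]))
    print(r_stat(xs), polygon_gap(xs, [j / n for j in range(1, n + 1)]))
```

The grid value never exceeds the vertex value, as the convex-combination argument predicts, and approaches it as the grid is refined.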
III. ASYMPTOTIC EQUIVALENCE OF THE TEST STATISTICS


It is well known that

$$\lim_{N \to \infty} P[\sqrt{N} D_N > z] = 2 \sum_{r=1}^{\infty} (-1)^{r-1} \exp\{-2 r^2 z^2\}$$

[Ref. 2], and it is shown below that both $\sqrt{N} R_N$ and $\sqrt{N} C_N$ have the same limiting distribution.
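The limiting tail probability can be evaluated directly from the series, and asymptotic critical points follow by solving for $z$. The Python sketch below does this by bisection. Truncating at 100 terms and the bracketing interval are arbitrary choices; both are adequate because the terms decay like $\exp(-2r^2z^2)$. Dividing the result by $\sqrt{N}$ gives the usual large-sample approximation to the critical points of $D_N$.

```python
import math


def kolmogorov_tail(z, terms=100):
    """Limiting P[sqrt(N) D_N > z] = 2 * sum_{r>=1} (-1)^(r-1) exp(-2 r^2 z^2), z > 0."""
    return 2.0 * sum((-1) ** (r - 1) * math.exp(-2.0 * r * r * z * z)
                     for r in range(1, terms + 1))


def kolmogorov_critical(alpha, lo=0.3, hi=3.0, iters=60):
    """Solve kolmogorov_tail(z) = alpha by bisection; the tail decreases in z."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kolmogorov_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for alpha in (0.20, 0.10, 0.05, 0.025, 0.01):
        z = kolmogorov_critical(alpha)
        print(alpha, round(z, 4), round(z / math.sqrt(60), 4))  # e.g. N = 60
```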

Lemma 2: For all $N$ and all $F$, $R_N$ is stochastically smaller than $D_N$; that is, $R_N \le D_N$.

Proof: By Lemma 1,

$$D_N = \max_{1 \le j \le N} \{ |X_j - j/N|,\ |X_j - (j-1)/N| \},$$

and it follows from the definition, by the convex-combination argument given for $C_N$ in section two, that

$$R_N = \max_{1 \le j \le N} |X_j - j/N|.$$

The result is then immediate: the latter maximum is taken over a subset of the terms over which the former is taken.
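Lemma 3: For all $\epsilon > 0$,

$$\lim_{N \to \infty} P[\sqrt{N} |C_N - R_N| < \epsilon] = 1.$$

Proof: Consider

$$C_N = \max_{1 \le j \le N} |X_j - j/(N+1)| = \max_{1 \le j \le N} |X_j - j/N + j/N - j/(N+1)| \le \max_{1 \le j \le N} |X_j - j/N| + \frac{1}{N+1} = R_N + \frac{1}{N+1},$$

since $0 \le j/N - j/(N+1) = j/(N(N+1)) \le 1/(N+1)$ for $1 \le j \le N$. Thus $C_N - R_N \le 1/(N+1)$. Now consider

$$R_N = \max_{1 \le j \le N} |X_j - j/N| = \max_{1 \le j \le N} |X_j - j/(N+1) + j/(N+1) - j/N| \le \max_{1 \le j \le N} |X_j - j/(N+1)| + \frac{1}{N+1} = C_N + \frac{1}{N+1}.$$

Combining the two, it is seen that

$$\sqrt{N} |R_N - C_N| \le \frac{\sqrt{N}}{N+1}.$$

Thus

$$\lim_{N \to \infty} P[\sqrt{N} |C_N - R_N| < \epsilon] = 1,$$

since the probability equals one for all $N$ so large that $\sqrt{N}/(N+1) < \epsilon$.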
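Lemma 4: For all $\epsilon > 0$ it will be shown that

$$\lim_{N \to \infty} P[\sqrt{N} |D_N - R_N| < \epsilon] = 1.$$

Proof: Let $Q_N = \max_{1 \le j \le N} |X_j - (j-1)/N|$ and note that

$$Q_N = \max_{1 \le j \le N} |X_j - j/N + 1/N|.$$

Each term differs from the corresponding term $|X_j - j/N|$ of $R_N$ by at most $1/N$, so $|Q_N - R_N| \le 1/N$. Thus

$$\sqrt{N} |Q_N - R_N| \le 1/\sqrt{N}.$$

Also, by Lemma 1, $D_N = \max\{R_N, Q_N\}$, so that

$$|D_N - R_N| = \max\{0, Q_N - R_N\} \le |Q_N - R_N|,$$

and it follows that

$$\lim_{N \to \infty} P[\sqrt{N} |D_N - R_N| < \epsilon] = 1,$$

since the probability equals one for all $N$ so large that $1/\sqrt{N} < \epsilon$.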
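The bounds in Lemmas 3 and 4 are deterministic, so they hold sample by sample. The Python sketch below simulates uniform samples and records the largest observed values of $\sqrt{N}|C_N - R_N|$ and $\sqrt{N}|D_N - R_N|$. It compares them with $\sqrt{N}/(N+1)$ and $1/\sqrt{N}$, both of which shrink to zero. The sample sizes, replication count, and seed are arbitrary choices.

```python
import math
import random


def stats(xs):
    """(D_N, R_N, C_N) for sorted uniform (0,1) data; D_N = max(R_N, Q_N) by Lemma 1."""
    n = len(xs)
    r = max(abs(x - j / n) for j, x in enumerate(xs, start=1))
    q = max(abs(x - (j - 1) / n) for j, x in enumerate(xs, start=1))
    c = max(abs(x - j / (n + 1)) for j, x in enumerate(xs, start=1))
    return max(r, q), r, c


def worst_scaled_gaps(n, reps, seed=0):
    """Largest sqrt(N)|C_N - R_N| and sqrt(N)|D_N - R_N| over `reps` simulated samples."""
    rng = random.Random(seed)
    worst_cr = worst_dr = 0.0
    for _ in range(reps):
        d, r, c = stats(sorted(rng.random() for _ in range(n)))
        worst_cr = max(worst_cr, math.sqrt(n) * abs(c - r))
        worst_dr = max(worst_dr, math.sqrt(n) * abs(d - r))
    return worst_cr, worst_dr


if __name__ == "__main__":
    for n in (10, 100, 1000):
        cr, dr = worst_scaled_gaps(n, 200)
        # observed maxima alongside the bounds sqrt(N)/(N+1) and 1/sqrt(N)
        print(n, round(cr, 4), round(math.sqrt(n) / (n + 1), 4),
              round(dr, 4), round(1 / math.sqrt(n), 4))
```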
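Lemma 5:

$$\lim_{N \to \infty} P[\sqrt{N} R_N > z] = 2 \sum_{r=1}^{\infty} (-1)^{r-1} \exp\{-2 r^2 z^2\}.$$

Proof:

$$P[\sqrt{N} D_N > z] = P[\sqrt{N} R_N + \sqrt{N}(D_N - R_N) > z].$$

By Lemma 4, $\sqrt{N}(D_N - R_N)$ converges to zero in probability, so $\sqrt{N} D_N$ and $\sqrt{N} R_N$ have the same limiting distribution. Hence

$$\lim_{N \to \infty} P[\sqrt{N} R_N > z] = \lim_{N \to \infty} P[\sqrt{N} D_N > z],$$

which is the Kolmogorov limit given above. By Lemma 3, the same limit holds for $\sqrt{N} C_N$.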
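Lemma 5 can be illustrated by simulation. The Python sketch below estimates $P[\sqrt{N} R_N > z]$ under $H_0$ by Monte Carlo at the approximate asymptotic 5% point $z = 1.3581$ and prints the limiting value beside it. $N = 200$, the replication count, and the seed are arbitrary. At finite $N$ the estimate is expected to lie somewhat below the limit, since $R_N \le D_N$.

```python
import math
import random


def r_stat(xs):
    """R_N = max_j |X_j - j/N| for sorted uniform (0,1) data."""
    n = len(xs)
    return max(abs(x - j / n) for j, x in enumerate(xs, start=1))


def limit_tail(z, terms=100):
    """Kolmogorov limit 2 * sum_{r>=1} (-1)^(r-1) exp(-2 r^2 z^2)."""
    return 2.0 * sum((-1) ** (r - 1) * math.exp(-2.0 * r * r * z * z)
                     for r in range(1, terms + 1))


def mc_tail(n, z, reps, seed=0):
    """Monte Carlo estimate of P[sqrt(N) R_N > z] for uniform (0,1) samples."""
    rng = random.Random(seed)
    hits = sum(math.sqrt(n) * r_stat(sorted(rng.random() for _ in range(n))) > z
               for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    z = 1.3581  # approximate asymptotic 5% point
    print(round(limit_tail(z), 4), mc_tail(200, z, 3000))
```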


IV. COMPUTATIONAL METHOD 


The method used to evaluate the cumulative distribution functions of the statistics $D_N$, $C_N$, and $R_N$ is from the work of Durbin [Ref. 1]. This section sketches his results and the modifications needed for the statistics under consideration.

"Suppose that $0 \le X_1 \le X_2 \le \cdots \le X_N \le 1$ is an ordered random sample of independent observations from a uniform (0,1) distribution. The sample distribution function is $F_N(x)$ as defined in section two. Let S denote the sample path of $F_N(x)$ as x moves from 0 to 1. In this paper we consider the probability $p_N(a,b,c)$ that S lies entirely in the region R between the lines $Ny = a + (N+c)x$ and $Ny = -b + (N+c)x$ ($a > 0$, $b > 0$, $a + b > 0$, and $a - c > 0$)."¹
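Durbin's region can be made concrete. Between jumps the sample path is flat while both boundary lines rise, so it suffices to check the top and bottom of each jump together with the endpoints $(0,0)$ and $(1,1)$. The Python sketch below does this on simulated uniform samples. It checks that the bands with parameters $(k,k,0)$, $(k,k+1,1)$ and $(k,k+1,0)$ accept exactly the samples with $D_N < k/N$, $C_N < k/(N+1)$ and $R_N < k/N$, the correspondences used in Lemmas 6 to 8 below. Function names, sample sizes, and seeds are illustrative.

```python
import random


def in_band(xs, a, b, c):
    """True if the step path of F_N lies strictly between Ny = a + (N+c)x and Ny = -b + (N+c)x."""
    n = len(xs)
    if not (a > 0 and b > c):  # the endpoints (0, 0) and (1, 1)
        return False
    for j, x in enumerate(xs, start=1):
        line = (n + c) * x
        # top of the j-th jump below the upper line, bottom above the lower line
        if not (j < a + line and j - 1 > -b + line):
            return False
    return True


def d_r_c(xs):
    """(D_N, R_N, C_N) for sorted uniform (0,1) data, D_N by Lemma 1."""
    n = len(xs)
    r = max(abs(x - j / n) for j, x in enumerate(xs, start=1))
    q = max(abs(x - (j - 1) / n) for j, x in enumerate(xs, start=1))
    c = max(abs(x - j / (n + 1)) for j, x in enumerate(xs, start=1))
    return max(r, q), r, c


def count_mismatches(n, reps, seed=0):
    """Number of (sample, k) cases where band membership and the statistic disagree."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(reps):
        xs = sorted(rng.random() for _ in range(n))
        d, r, c = d_r_c(xs)
        for k in range(1, n + 1):
            bad += in_band(xs, k, k, 0) != (d < k / n)
            bad += in_band(xs, k, k + 1, 1) != (c < k / (n + 1))
            bad += in_band(xs, k, k + 1, 0) != (r < k / n)
    return bad


if __name__ == "__main__":
    print(count_mismatches(8, 2000))  # expected: 0
```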


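Define

$$H = \begin{pmatrix}
1-\delta & 1 & 0 & 0 & \cdots & 0 \\
(1-\delta^2)/2! & 1 & 1 & 0 & \cdots & 0 \\
(1-\delta^3)/3! & 1/2! & 1 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
(1-\delta^P-\epsilon^P+h)/P! & (1-\epsilon^{P-1})/(P-1)! & \cdots & \cdots & \cdots & 1-\epsilon
\end{pmatrix}.$$

That is, the $(i,j)$ element of $H$ is $1/(i-j+1)!$ when $i - j + 1 \ge 0$ and zero otherwise, with three exceptions:

- the first column is $(1-\delta^i)/i!$;
- the last row is $(1-\epsilon^{P-j+1})/(P-j+1)!$;
- the lower left corner is $(1-\delta^P-\epsilon^P+h)/P!$.

Here

$[z]$ = the greatest integer in $z$,
$1 - \delta = (b-c) - [b-c]$,
$1 - \epsilon = (a-c) - [a-c]$,
$P = [b-c] + [a+c] + 1$,
$h = 0$ if $\delta + \epsilon \le 1$, and $h = (\delta + \epsilon - 1)^P$ if $\delta + \epsilon > 1$,
$q$ = the $([b-c]+1, [b]+1)$ element of the matrix $H^{[N+c]}$.

Thus

$$p_N(a,b,c) = \frac{N!}{(N+c)^N}\, q \quad \text{[Ref. 1]}.$$

¹ Durbin, J., "The Probability That the Sample Distribution Function Lies Between Two Parallel Straight Lines," The Annals of Mathematical Statistics, Vol. 39, p. 398, April 1968.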
Lemma 6: $P[D_N < k/N] = p_N(k,k,0)$ [Ref. 1].

Lemma 7: $P[C_N < k/(N+1)] = p_N(k,k+1,1)$ [Ref. 1].

Lemma 8: $P[R_N < k/N] = p_N(k,k+1,0)$.


Proof: Assume the sample cumulative distribution $F_N(x)$ has stayed within the bounds shown in Figure 2 until it arrives at the point $(X_j, j/N)$. For non-rejection by the $D_N$ statistic, the next observation, $X_{j+1}$, must occur before $x_D$. For $R_N$ to accept the sample, the next observation must occur before $x_R$. All steps are of height $1/N$, and $R_N$ compares the top $(X_{j+1}, (j+1)/N)$ of the next step with the lower boundary, rather than its bottom $(X_{j+1}, j/N)$. Hence the point $(X_{j+1}, (j+1)/N)$ is in the region R if the lower band of the equivalent $D_N$ statistic is shifted downward by $1/N$. The resultant band is $Ny = -(b+1) + Nx$. If the point $(X_{j+1}, (j+1)/N)$ were to fall in the region W above the upper line, the hypothesis $H_0$ would be rejected by either $D_N$ or $R_N$. Hence the upper bound for the region R is unchanged.
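The matrix formula for $p_N(a,b,c)$ given earlier in this section is easy to program with exact rational arithmetic. The Python sketch below builds $H$ from the stated definitions and evaluates $p_N(a,b,c)$. It is restricted to integer $a$, $b$, $c$, which covers the triples in Lemmas 6 to 8; for these $\delta = \epsilon = 1$, so the first column and last row of $H$ vanish. This is not the thesis program, which used floating point on the IBM 360/67, and the function names are illustrative. Small cases can be checked by hand: $P[D_2 < 1/2] = 1/2$, $P[R_2 < 1/2] = 3/4$, and $P[C_2 < 1/3] = 7/9$. The value $P[D_{10} < 1/10] = 10!/10^{10}$ matches the first line of the sample computer output (0.0003628) to within rounding.

```python
import math
from fractions import Fraction


def durbin_matrix(a, b, c):
    """Durbin's matrix H for integer a, b, c (1-based formulas, 0-based storage)."""
    if not all(isinstance(v, int) for v in (a, b, c)):
        raise ValueError("this sketch handles integer a, b, c only")
    delta = 1 - ((b - c) - math.floor(b - c))  # 1 - delta = fractional part of b - c
    eps = 1 - ((a - c) - math.floor(a - c))    # 1 - epsilon = fractional part of a - c
    p = math.floor(b - c) + math.floor(a + c) + 1
    h = (delta + eps - 1) ** p if delta + eps > 1 else 0
    H = [[Fraction(0)] * p for _ in range(p)]
    for i in range(1, p + 1):
        for j in range(1, p + 1):
            k = i - j + 1
            if k < 0:
                continue  # zeros above the superdiagonal
            if i == p and j == 1:
                val = Fraction(1 - delta**p - eps**p + h, math.factorial(p))
            elif j == 1:
                val = Fraction(1 - delta**i, math.factorial(i))
            elif i == p:
                val = Fraction(1 - eps**k, math.factorial(k))
            else:
                val = Fraction(1, math.factorial(k))
            H[i - 1][j - 1] = val
    return H


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, e):
    """A**e by repeated squaring."""
    R = [[Fraction(int(i == j)) for j in range(len(A))] for i in range(len(A))]
    while e:
        if e & 1:
            R = mat_mul(R, A)
        A = mat_mul(A, A)
        e >>= 1
    return R


def durbin_p(n, a, b, c):
    """p_N(a, b, c) = N!/(N+c)^N * q, q the ([b-c]+1, [b]+1) element of H^[N+c]."""
    T = mat_pow(durbin_matrix(a, b, c), n + c)
    q = T[b - c][b]  # 0-based form of the ([b-c]+1, [b]+1) element
    return Fraction(math.factorial(n), (n + c) ** n) * q


if __name__ == "__main__":
    for k in range(1, 10):
        print(k, float(durbin_p(10, k, k, 0)))  # P[D_10 < k/10]
```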

The computer program used to table the distributions of $C_N$ and $R_N$ is presented at the rear of this thesis. As written, the program restricts the parameter $P$ to be less than or equal to forty. The subroutine DURMAT is used to keep the matrix products within the floating-point range of the IBM 360/67 used.

[Figure 2: sample path between the boundary lines Ny = a + Nx and Ny = -b + Nx, used in the proof of Lemma 8.]
V. COMPARISON OF STATISTICS, REMARKS, AND CONCLUSIONS


The distributions of the statistics $C_N$ and $R_N$ are presented for the first time in Appendices A and B. Critical points for several levels of significance are presented in Appendix C for $D_N$, $C_N$, and $R_N$. For sample sizes greater than sixty, all three statistics have essentially the same critical points for alpha = .20, .10, .05, .025, and .01.

From Lemma 2 it is known that $R_N$ is stochastically smaller than $D_N$. Appendix C indicates that, under $H_0$, $C_N$ is also stochastically smaller than $D_N$ for the alpha values shown. Such results are important in developing narrower confidence bands for distributions [Ref. 7].
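As an independent check on tabled values, exact probabilities for all three statistics can also be computed from Steck's determinant formula for rectangle probabilities of uniform order statistics. This is a different method from Durbin's and is not the one used in this thesis. Each event $D_N < t$, $C_N < t$, $R_N < t$ is a set of bounds $l_j < X_j < u_j$ on the order statistics, and Steck's formula gives the probability as $N!\det[m_{ij}]$, where $m_{ij} = (u_i - l_j)_+^{\,j-i+1}/(j-i+1)!$ for $j \ge i-1$ and $m_{ij} = 0$ otherwise. The Python sketch below uses exact rational arithmetic, and its names are illustrative. At the grid points $t = k/N$ it also exhibits the inequality $P[R_N < t] \ge P[D_N < t]$ implied by Lemma 2.

```python
import math
from fractions import Fraction


def det(m):
    """Determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= f * a[col][k]
    return result


def steck(lower, upper):
    """P(lower[j] < X_j < upper[j] for all j) for uniform (0,1) order statistics."""
    n = len(lower)
    m = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - 1), n):
            k = j - i + 1
            gap = max(Fraction(0), upper[i] - lower[j])
            m[i][j] = Fraction(1) if k == 0 else gap**k / math.factorial(k)
    return math.factorial(n) * det(m)


def prob_less(stat, n, k):
    """Exact P[stat < t], with t = k/N for 'D' and 'R' and t = k/(N+1) for 'C'."""
    lo, hi = [], []
    for j in range(1, n + 1):
        if stat == "D":
            l, u = Fraction(j - k, n), Fraction(j - 1 + k, n)
        elif stat == "R":
            l, u = Fraction(j - k, n), Fraction(j + k, n)
        else:  # "C"
            l, u = Fraction(j - k, n + 1), Fraction(j + k, n + 1)
        lo.append(max(Fraction(0), l))
        hi.append(min(Fraction(1), u))
    return steck(lo, hi)


if __name__ == "__main__":
    n = 10
    for k in range(1, n):
        print(k, *(round(float(prob_less(s, n, k)), 7) for s in ("D", "R", "C")))
```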

The asymptotic formula of Kolmogorov is correct to three decimal places for all three statistics when N > 60. The tables indicate that $D_N$ reaches the limiting distribution earlier (i.e., for N > 30) at the alpha values listed. Appendix C shows that for alpha > .05 the distribution of $D_N$ approaches the asymptote from above, whereas both $C_N$ and $R_N$ approach it from below for all listed values of alpha. Also, $C_N$ and $R_N$ are in essential agreement for sample sizes greater than fifteen.

Other statistics can be investigated empirically and analytically using Durbin's paper. An interesting area for investigation would be to relate the power functions of the test statistics to the parameters (a, b, c).
COMPUTER OUTPUT

SAMPLE PROGRAM
THE D STATISTIC FOR N=10

IA= 1 IB= 1 IC= 0 NNN= 10 PROB. = 0.0003628
IA= 2 IB= 2 IC= 0 NNN= 10 PROB. = 0.2512797
IA= 3 IB= 3 IC= 0 NNN= 10 PROB. = 0.7294613
IA= 4 IB= 4 IC= 0 NNN= 10 PROB. = 0.9410061
IA= 5 IB= 5 IC= 0 NNN= 10 PROB. = 0.9922168
IA= 6 IB= 6 IC= 0 NNN= 10 PROB. = 0.9994260
IA= 7 IB= 7 IC= 0 NNN= 10 PROB. = 0.9999743
IA= 8 IB= 8 IC= 0 NNN= 10 PROB. = 0.9999936
IA= 9 IB= 9 IC= 0 NNN= 10 PROB. = 0.9999938

OUT OF RANGE IA= 10 IB= 10 IC= 0 NNN= 10

THE ABOVE IS AN EXAMPLE OF ONE OF THE ERROR MESSAGES